fix: fix rag example #283
Conversation
Summary of Changes

Hello @e06084, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request resolves issues in the RAG (Retrieval Augmented Generation) examples by adapting them to recent API changes in the evaluation framework. The updates ensure that evaluation results are correctly processed and displayed, and they improve file path handling for the test datasets, making the examples functional and aligned with the current library behavior.
Code Review
This pull request fixes the RAG examples by updating them to use the new API for evaluation results, where eval_status is renamed to status and eval_details is removed in favor of using the result object directly. The changes are correct and consistent across all example files.
I've added a few suggestions for examples/rag/sdk_rag_eval_batch_dataset.py to improve maintainability by addressing code duplication, a misleading variable name, and inconsistent logging. These changes would make the example code cleaner and easier to understand.
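To make the API change concrete, here is a minimal sketch of the before/after consumption pattern. The `EvalResult` stand-in and its construction are hypothetical; only the field names (`status` replacing `eval_status`, plus `score`) and the pass/fail convention are taken from the diffs in this PR.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the library's evaluation result object,
# used only to illustrate the renamed field.
@dataclass
class EvalResult:
    status: bool   # the examples treat a falsy status as "passed"
    score: float

def report(result: EvalResult) -> None:
    # Before this PR the examples read result.eval_status (and eval_details);
    # after the PR they read result.status and use the result object directly.
    verdict = "passed" if not result.status else "failed"
    print(f"Status: {verdict}")
    print(f"Score: {result.score}/10")

report(EvalResult(status=False, score=8.5))
```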
  # 输入文件路径配置
- CSV_FILE_PATH = "ragflow_eval_data_50.jsonl"  # 支持CSV和JSONL格式
+ CSV_FILE_PATH = Path("test/data/ragflow_eval_data_50.jsonl")  # 支持CSV和JSONL格式
The variable name CSV_FILE_PATH is misleading as the comment and the code logic indicate that it can also be a .jsonl file. To improve clarity, consider renaming it to something more generic like INPUT_FILE_PATH. Note that this will require updating its usages in the main function as well.
- CSV_FILE_PATH = Path("test/data/ragflow_eval_data_50.jsonl")  # 支持CSV和JSONL格式
+ INPUT_FILE_PATH = Path("test/data/ragflow_eval_data_50.jsonl")  # 支持CSV和JSONL格式
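Since the configured path may point at either a CSV or a JSONL file (per the inline comment 支持CSV和JSONL格式, "supports CSV and JSONL formats"), here is a hedged sketch of dispatching on the file suffix. The `load_records` helper is hypothetical and not part of this PR; the example script's actual loading logic lives elsewhere in the file.

```python
import csv
import json
from pathlib import Path

def load_records(input_path: Path) -> list[dict]:
    """Hypothetical loader: pick a parser based on the file extension."""
    suffix = input_path.suffix.lower()
    if suffix == ".jsonl":
        # One JSON object per line.
        with input_path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".csv":
        # Header row becomes the dict keys.
        with input_path.open(encoding="utf-8", newline="") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"Unsupported input format: {suffix}")

# Usage mirroring the constant in the example:
records = load_records(Path("test/data/ragflow_eval_data_50.jsonl"))
print(f"Loaded {len(records)} records")
```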
| print("\n1. 忠实度 (Faithfulness):") | ||
| faithfulness_result = LLMRAGFaithfulness.eval(data) | ||
| print(f" 状态: {'✅ 通过' if not faithfulness_result.eval_status else '❌ 未通过'}") | ||
| print(f" 状态: {'✅ 通过' if not faithfulness_result.status else '❌ 未通过'}") | ||
| print(f" 分数: {faithfulness_result.score}/10") | ||
| total_faithfulness += faithfulness_result.score | ||
|
|
||
| logger.info("\n2. 上下文精度 (Context Precision):") | ||
| print("\n2. 上下文精度 (Context Precision):") | ||
| precision_result = LLMRAGContextPrecision.eval(data) | ||
| logger.info(f" 状态: {'✅ 通过' if not precision_result.eval_status else '❌ 未通过'}") | ||
| logger.info(f" 状态: {'✅ 通过' if not precision_result.status else '❌ 未通过'}") | ||
| logger.info(f" 分数: {precision_result.score}/10") | ||
| print(f" 状态: {'✅ 通过' if not precision_result.eval_status else '❌ 未通过'}") | ||
| print(f" 状态: {'✅ 通过' if not precision_result.status else '❌ 未通过'}") | ||
| print(f" 分数: {precision_result.score}/10") | ||
| total_precision += precision_result.score | ||
|
|
||
| print("\n3. 上下文召回 (Context Recall):") | ||
| recall_result = LLMRAGContextRecall.eval(data) | ||
| print(f" 状态: {'✅ 通过' if not recall_result.eval_status else '❌ 未通过'}") | ||
| print(f" 状态: {'✅ 通过' if not recall_result.status else '❌ 未通过'}") | ||
| print(f" 分数: {recall_result.score}/10") | ||
| total_recall += recall_result.score | ||
|
|
||
| print("\n4. 上下文相关性 (Context Relevancy):") | ||
| relevancy_result = LLMRAGContextRelevancy.eval(data) | ||
| print(f" 状态: {'✅ 通过' if not relevancy_result.eval_status else '❌ 未通过'}") | ||
| print(f" 状态: {'✅ 通过' if not relevancy_result.status else '❌ 未通过'}") | ||
| print(f" 分数: {relevancy_result.score}/10") | ||
| total_relevancy += relevancy_result.score | ||
| # | ||
| print("\n5. 答案相关性 (Answer Relevancy):") | ||
| answer_relevancy_result = LLMRAGAnswerRelevancy.eval(data) | ||
| print(f" 状态: {'✅ 通过' if not answer_relevancy_result.eval_status else '❌ 未通过'}") | ||
| print(f" 状态: {'✅ 通过' if not answer_relevancy_result.status else '❌ 未通过'}") | ||
| print(f" 分数: {answer_relevancy_result.score}/10") | ||
| total_answer_relevancy += answer_relevancy_result.score |
This block of code for evaluating metrics is almost identical to the one in the evaluate_from_csv function (lines 271-302). This code duplication makes the code harder to maintain, as any change in the evaluation logic needs to be applied in two places. Consider extracting this logic into a separate helper function to improve maintainability and reduce redundancy.
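One way to act on this suggestion is a small helper that runs a metric, prints its status and score, and returns the score so each call site can accumulate its own total. This is a sketch only: the metric classes and the `status`/`score` fields are taken from the diff, while the helper name and signature are illustrative.

```python
def run_metric(label: str, metric_cls, data) -> float:
    """Evaluate one RAG metric and print its result (hypothetical helper)."""
    print(f"\n{label}:")
    result = metric_cls.eval(data)
    # The examples treat a falsy status as a pass.
    print(f"  状态: {'✅ 通过' if not result.status else '❌ 未通过'}")
    print(f"  分数: {result.score}/10")
    return result.score

# Both evaluate functions could then replace their duplicated blocks with calls like:
# total_faithfulness += run_metric("1. 忠实度 (Faithfulness)", LLMRAGFaithfulness, data)
# total_precision    += run_metric("2. 上下文精度 (Context Precision)", LLMRAGContextPrecision, data)
```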
| logger.info("\n2. 上下文精度 (Context Precision):") | ||
| print("\n2. 上下文精度 (Context Precision):") | ||
| precision_result = LLMRAGContextPrecision.eval(data) | ||
| logger.info(f" 状态: {'✅ 通过' if not precision_result.eval_status else '❌ 未通过'}") | ||
| logger.info(f" 状态: {'✅ 通过' if not precision_result.status else '❌ 未通过'}") | ||
| logger.info(f" 分数: {precision_result.score}/10") | ||
| print(f" 状态: {'✅ 通过' if not precision_result.eval_status else '❌ 未通过'}") | ||
| print(f" 状态: {'✅ 通过' if not precision_result.status else '❌ 未通过'}") | ||
| print(f" 分数: {precision_result.score}/10") | ||
| total_precision += precision_result.score |
The logging for 'Context Precision' is inconsistent with other metrics in this loop. It logs to both the logger and stdout, and the calls are duplicated, whereas other metrics only print to stdout. For consistency and to remove duplication, I suggest using only print here, similar to the other metrics.
| logger.info("\n2. 上下文精度 (Context Precision):") | |
| print("\n2. 上下文精度 (Context Precision):") | |
| precision_result = LLMRAGContextPrecision.eval(data) | |
| logger.info(f" 状态: {'✅ 通过' if not precision_result.eval_status else '❌ 未通过'}") | |
| logger.info(f" 状态: {'✅ 通过' if not precision_result.status else '❌ 未通过'}") | |
| logger.info(f" 分数: {precision_result.score}/10") | |
| print(f" 状态: {'✅ 通过' if not precision_result.eval_status else '❌ 未通过'}") | |
| print(f" 状态: {'✅ 通过' if not precision_result.status else '❌ 未通过'}") | |
| print(f" 分数: {precision_result.score}/10") | |
| total_precision += precision_result.score | |
| print("\n2. 上下文精度 (Context Precision):") | |
| precision_result = LLMRAGContextPrecision.eval(data) | |
| print(f" 状态: {'✅ 通过' if not precision_result.status else '❌ 未通过'}") | |
| print(f" 分数: {precision_result.score}/10") | |
| total_precision += precision_result.score |
No description provided.